Optimal Checkpointing Strategies for Iterative Applications

Authors

Abstract

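This work provides an optimal checkpointing strategy to protect iterative applications from fail-stop errors. We consider a general framework, where the application repeats the same execution pattern by executing consecutive iterations, and where each iteration is composed of several tasks. These tasks have different execution lengths and different checkpoint costs. Assume that there are $n$ tasks and that task $i$, where $0 \le i < n$, has execution time $t_i$ and checkpoint cost $c_i$. A naive strategy would checkpoint after each task. Another naive strategy would checkpoint at the end of each iteration. A strategy inspired by the Young/Daly formula would work for $\sqrt{2\mu c_{\text{ave}}}$ seconds, where $\mu$ is the application MTBF and $c_{\text{ave}}$ is the average checkpoint time, and then checkpoint at the end of the current task (and repeat). Another strategy, also inspired by the Young/Daly formula, would select the task with the smallest checkpoint cost $c_{\min}$ and checkpoint after every $p$-th instance of that task, leading to a checkpointing period $pT$, where $T = \sum_{i=0}^{n-1} t_i$ is the time per iteration. One would choose $p$ so that $pT$ obeys the Young/Daly formula, i.e., $pT \approx \sqrt{2\mu c_{\min}}$. All these strategies are suboptimal. Our main contribution is to show that the optimal checkpointing strategy is globally periodic, and to design a dynamic programming algorithm that computes the optimal checkpointing pattern. This pattern may well checkpoint many different tasks, and this across many different iterations. We show through simulations, on both synthetic and real-life application scenarios, that the optimal strategy outperforms the naive and Young/Daly strategies.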
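To make the strategies above concrete, here is a minimal sketch that compares the two naive placements against a dynamic program over checkpoint positions. It is NOT the authors' algorithm: it is a classical dynamic program for checkpoint placement in a linear chain, applied to a fixed horizon of `iters` unrolled iterations with a mandatory checkpoint at the end, rather than the globally periodic pattern computed in the paper. It assumes exponentially distributed failures with rate $\lambda = 1/\mu$, no downtime, a recovery cost equal to the cost of the checkpoint being restored, and the standard expected-time formula $E = \frac{1}{\lambda} e^{\lambda R}\left(e^{\lambda(W+C)} - 1\right)$ for work $W$, checkpoint cost $C$ and recovery cost $R$. All function names and the task values in the usage example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def young_daly_period(mu: float, c: float) -> float:
    """First-order Young/Daly checkpointing period sqrt(2 * mu * c)."""
    return math.sqrt(2.0 * mu * c)


def young_daly_p(t: Sequence[float], c: Sequence[float], mu: float) -> int:
    """Iterations p between checkpoints of the cheapest task, chosen so that
    p * T is close to sqrt(2 * mu * c_min), with T the time per iteration."""
    T = sum(t)
    return max(1, round(young_daly_period(mu, min(c)) / T))


def segment_time(work: float, ckpt: float, recovery: float, lam: float) -> float:
    """Expected time to execute `work` seconds and then a checkpoint of cost
    `ckpt` under exponential failures of rate `lam`, paying `recovery` after
    each failure (ASSUMPTION: no downtime)."""
    return math.exp(lam * recovery) * math.expm1(lam * (work + ckpt)) / lam


def pattern_time(t: Sequence[float], c: Sequence[float], iters: int,
                 ckpt_after: Sequence[bool], lam: float) -> float:
    """Expected time of a fixed pattern over `iters` unrolled iterations;
    ckpt_after[j] is True if we checkpoint after unrolled task j."""
    n = len(t)
    if len(ckpt_after) != n * iters or not ckpt_after[-1]:
        raise ValueError("pattern must cover the horizon and end with a checkpoint")
    total, work, recovery = 0.0, 0.0, 0.0  # start of run: nothing to recover
    for j in range(n * iters):
        work += t[j % n]
        if ckpt_after[j]:
            total += segment_time(work, c[j % n], recovery, lam)
            # ASSUMPTION: recovering from a checkpoint costs as much as taking it.
            work, recovery = 0.0, c[j % n]
    return total


def optimal_pattern(t: Sequence[float], c: Sequence[float], iters: int,
                    lam: float) -> Tuple[float, List[bool]]:
    """O((n*iters)^2) dynamic program: best[j] is the minimal expected time to
    finish the first j unrolled tasks with a checkpoint right after task j-1."""
    n, m = len(t), len(t) * iters
    prefix = [0.0]
    for j in range(m):
        prefix.append(prefix[-1] + t[j % n])
    best = [math.inf] * (m + 1)
    prev = [-1] * (m + 1)
    best[0] = 0.0
    for j in range(1, m + 1):
        ckpt = c[(j - 1) % n]
        for i in range(j):  # i = position of the previous checkpoint (0 = start)
            recovery = 0.0 if i == 0 else c[(i - 1) % n]
            cost = best[i] + segment_time(prefix[j] - prefix[i], ckpt, recovery, lam)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    ckpt_after = [False] * m
    j = m
    while j > 0:  # backtrack through the chosen checkpoint positions
        ckpt_after[j - 1] = True
        j = prev[j]
    return best[m], ckpt_after


if __name__ == "__main__":
    t = [60.0, 300.0, 120.0, 30.0]  # hypothetical task lengths (s)
    c = [5.0, 40.0, 10.0, 80.0]     # hypothetical checkpoint costs (s)
    mu, iters = 3600.0, 10
    lam, m = 1.0 / mu, len(t) * iters
    every_task = pattern_time(t, c, iters, [True] * m, lam)
    every_iter = pattern_time(t, c, iters, [(j + 1) % len(t) == 0 for j in range(m)], lam)
    opt, pattern = optimal_pattern(t, c, iters, lam)
    print(f"after every task: {every_task:.1f}s, after every iteration: {every_iter:.1f}s, DP: {opt:.1f}s")
    print("checkpoint after unrolled tasks:", [j for j, ck in enumerate(pattern) if ck])
    print("Young/Daly period with c_ave:", young_daly_period(mu, sum(c) / len(c)))
    print("Young/Daly p for the cheapest task:", young_daly_p(t, c, mu))
```

Because checkpointing after every task and checkpointing at every iteration boundary are both patterns the dynamic program can choose, under this model its result is never worse than either. The sketch simply unrolls a fixed number of iterations instead of exploiting the global periodicity established in the paper, which is why its cost grows quadratically with the horizon.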

Similar resources

Optimal Sampling Strategies for Oceanic Applications

We have developed a method for optimal array design and applied it to a suite of applications, including the design of a surface mooring array in the tropical Indian Ocean (Sakov and Oke 2007). The method builds on the work of Bishop et al. (2001), using data assimilation theory to determine the observation locations that best constrain a data assimilating ocean model. The method seeks to ident...

Multigrid and Iterative Strategies for Optimal Control Problems

In this minisymposium we focus on optimal control problems, which constitute an important class of PDE-constrained optimization problems. There are many PDEs which can act as the constraints within the problem, such as Stokes-type equations, PDEs with a time-dependent component, and many others – consequently there is considerable potential for applications in applied sciences. One of the major...

Checkpointing Strategies for Scheduling Computational Workflows

We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and deciding for each task wheth...

Checkpointing Strategies for Scheduling Computational Workflows Guillaume

Checkpointing and Its Applications

This paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and identifies two concepts that have proven to be the key to making checkpointing a powerful tool. First, including all persistent state, i.e., user files, as part of the process state that can be checkpointed and recovered provides a truly transparent and consistent rollback....

Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2022

ISSN: 1045-9219, 1558-2183, 2161-9883

DOI: https://doi.org/10.1109/tpds.2021.3099440